Securing Machine Learning Classifiers with Input Hashing Re-weight Strategy
نویسنده
چکیده
Adversarial machine learning resides at the intersection of machine learning and computer security. Originally, machine learning techniques were designed for environments that do not assume the presence of an adversary. However, in the presence of intelligent adversaries, this working hypothesis is likely to be violated to at least to some degree, depending on the skillset of an adversary. A skilful adversary can carefully manipulate the input data exploiting specific vulnerabilities of learning algorithms. This results in misclassification of malicious instances, which may compromise the whole system security. For example, by carefully modifying values of features with largest weight without changing the outcome of malicious packet, an adversary may trick an intrusion detection system to allow malicious packet into the network. Solutions presented in research studies by other authors consider the classifier protection using re-weight strategies; typically, this results in compromise between accuracy and robustness. Unlike those, the research presented in this paper deals with a re-weight strategy based on hashing all the numeric features without classification accuracy degradation. System becomes robust as feature weights are even and avalanche effect makes virtually impossible for an attacker to modify the input data and trick the learner into misclassification. Research hypotheses are experimentally validated on custom intrusion detection dataset consisting of numeric features.
منابع مشابه
Augmented hashing for semi-supervised scenarios
Hashing methods for fast approximate nearest-neighbor search are getting more and more attention with the excessive growth of the available data today. Embedding the points into the Hamming space is an important question of the hashing process. Analogously to machine learning there exist unsupervised, supervised and semi-supervised hashing methods. In this paper we propose a generic procedure t...
متن کاملEnhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملTask-driven Greedy Learning of Feature Hashing Functions
Randomly hashing multiple features into one aggregated feature is routinely used in largescale machine learning tasks to both increase speed and decrease memory requirements, with little or no sacrifice in performance. In this paper we investigate whether using a learned (instead of a random) hashing function improves performance. We show experimentally that with increasing difference between t...
متن کاملApplication of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
متن کاملTo Re(label), or Not To Re(label)
One of the most popular uses of crowdsourcing is to provide training data for supervised machine learning algorithms. Since human annotators often make errors, requesters commonly ask multiple workers to label each example. But is this strategy always the most cost effective use of crowdsourced workers? We argue “No” — often classifiers can achieve higher accuracies when trained with noisy “uni...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017